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Abstract — Use of Radial Basis Function Neural Networks to 
predict the time series of river flux data was approached via 
NARMAX methodology. By modifying the network architecture 
in search of the best performance characteristics, it was 
observed that the problem of time series analysis has chaotic 
behavior, which cannot be fully accounted for by use of such 
technology, suggesting the need for more robust machine 
learning techniques. 

Index Terms — RBF neural network, NARMAX 

methodology, time series prediction, dynamic systems 


I. INTRODUCTION 

Radial Basis Function (RBF) networks are characterized for 
their use of a unique hidden layer whose neurons’ activation 
functions belong to the class of “radial basis functions”. Such 
a layer augments the dimensionality of input data when it 
maps input to hidden space, and the network’s output is a 
weighted sum of the hidden layer activations. 

Further, the NARMAX methodology refers to a technique 
for presenting input data to a neural network - be it RBF or 
not. In the context of a network modelling a dynamical system 
whose output is y(7) - where t stands for time, NARMAX 
consists in presenting to the model a history 
{y(f-l),y(f-2),...,y(f-u)} - where t d is the maximum 

delay - of past system outputs, supposing this is enough for 
learning its dynamics. 

In the following, we present the theory underlying RBF and 
NARMAX, followed by the analysis proposed in this work. 

Highlight a section that you want to designate with a certain 
style, and then select the appropriate name on the style menu. 
The style will adjust your fonts and line spacing. Do not 
change the font sizes or line spacing to squeeze more text 
into a limited number of pages. Use italics for emphasis; do 
not underline. 

A. RBF Networks 

The architecture known as RBF consists of two layers: the 
first one has m hidden units, such that the layer’s activation 
functions are {G 1 ,...,G m } ; the second is composed of / 

neurons, each of which generates a network output, such that 
each output y. computed from input v is determined by 
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y t = Gj (x) , where {w ; ,,..., w /m } are neural weights 

7=1 

associated to the i -th output neuron. 

Thus, hidden units, by increasing the dimensionality of 
input data, increase the chance of them becoming linearly 
separable [1]. In this work, we use a “bias” term w i0 in the 


output, such that we have instead y. = w i0 F^ j w..G j (x) . This 

j =i 


is equivalent to considering an extra hidden unit with constant 
unitary activation function. We also restrict the analysis to 
Gaussian activation functions, those with the format: 

Cm m2 N 


G a,Xc (x) = exp 


2cr 


( 1 ) 


V y 

Where x c and cr are respectively termed centroid and 
standard deviation, both being parameters of the function. 

There are two main types of RBF networks: regularization 
networks and generalized networks. The former possess as 
many hidden units as there are training samples, and there is 
rigorous mathematical support to guarantee the existence of 
solution to the interpolation problem which they solve [2], 
however - owing to their large degree of freedom in terms of 
neural parameters [3] - they are prone to overfit, whence there 
is considerable body of literature describing how to avoid this 
tendency [4], [5]. The latter, on the other hand, align to the 
general principle of the Occam Razor [6], in that their 
centroid count is usually less than the amount of training 
samples by several orders of magnitude, thus reducing the 
likelihood of overfit. From here on, we take “RBF network” 
to mean “generalized RBF network”. 

Several training techniques are known [7], [8]. One of the 
simplest is K-means [9], [10] for centroid coordinate 
computation, and pseudoinverse method for neural weight 
computation. 

K-means attemps to define hidden unit centroids 


{t x ,...J m } according to the geometry of samples in the 
training set 7 = {jq,...,jc^}, where N = |/| is the training set 


size and each x t is a training sample. To this effect, over a 
partition P = P m } of / , we define the “intra-group 


sum of squares”: 

m _ ( j 

L ( F )=ZZ X ~\^\L U 

7 = 1 t(= P. 1 uc=P 


\2 


( 2 ) 


Centroids are computed by finding P which minimizes 
L(P), then defining each t { as the arithmetical average of all 


elements belonging to P .. Since this minimization problem is 
NP-hard, usually heuristics are applied, which naturally may 
lead to local optima. 


31 


www.erpublication.org 







River Flux Prediction Via RBF Neural Networks And NARMAX Methodology 


Meanwhile, the pseudoinverse method is employed to 
compute output-layer weights after centroids and 
standard-deviations have been fixed. In such a context, and 
considering a scalar-output network (generalizing to 
multidimensional output is immediate), the method defines a 


cost function: 
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Where 

y. is the expected output for each input jc. 

Minimizing 

; C leads 

to: 
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Where we assume the following notations: 
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B. NARMAX methodology 

For a variety of natural phenomena, physical laws are 
known which can describe and predict their evolution 
quantitatively. When this is the case, physics offers a 
mathematical model through which the dynamical system 
may be studied. However, there are complex systems for 
which either no structure is known or it is too complex to be 
completely treated computationally. This situation motivates 
the use of empirical formulations in an attempt to analyse the 
problem, one of them being NARMAX [11]. 

In NARMAX formulation, a discrete system with output y 
and input u is described by [12]: 

}+) = F (y(t •• ,y(t ~n y ),u(t -1),” -,u(t -«„)) (8) 

Where F is an arbitrary non-linear function, n y is the 
maximum output delay and n u is the maximum input delay. 
Thus, (8) is an approximation to the system under scrutiny. 

C. Problem proposal 

This work employs and RBF network coupled with 
NARMAX methodology to predict the flux of Ganges river 
(MG/Brasil) given the history of its flux in the past. The 
solution is implemented in MATLAB, and the combination of 
RBF and NARMAX has precedents in the literature [13]. 

In possession of a database registering monthly flux of 
Camargos and Furnas hydroelectric plants (built along the 
river) during a 82 year interval, we intended to have the 
network learn the river’s dynamics. To this effect, we 
architected the input, hidden and output layers of the model, 
defining - for instance - the maximum delay in presenting 
flux data to the network, the quantity of hidden units, and the 
training algorithm to be used - provided more than one is 
known. 

Varying the architecture choice, we compared all obtained 
results to one another, analyzing how each factor impacted 
network performance. 


II. METHODOLOGY 

Fig. 1 illustrates the network architecture. A single network 
processes both power plants’ fluxes, each with 2 delayed 
inputs. Input vector u is admitted to each of the m + 1 hidden 
units, the first of which is constant and equal to 1 (bias) while 
the remaining ones are Gaussian centered in t 1? .. .,t m with a 
common standard-deviation cr . Finally, the bidimensional 

m 

output is y(u) = w 0 +^w.G. (u) - where w. is a 

i =1 

bidimensional neural weight and G. is the radial basis 
function centered on t. - and represents the next predicted 
flux for each power plant. We adopt the notation 
W = [w 0 • • • w m ] r (m + 1 rows and 2 column) for the neural 
weights matrix. 



Figure 1. RBF network. Variables y c and yf refer to Camargos and Furnas 
plants, and boldface symbols are vectors. Parameters tu o and Wi must be 
learned by the network. 


A. Training and test sets 

Training the network means deciding on values for 
, cr and w 0 ,...w m . To this effect training 
samples were extracted from aforementioned database, in the 
following format: there were 984 flux measurements for each 
power plant, from $ (/) to ^ 984 (/) (Furnas plant) and 

from ^(c) to ^ 984 (c) (Camargos plant); for each triplet 
{$,$ +1 ,$+ 2 } , we built a sample such that the input to the 
network is u,. = {$ (/)4 +1 (/)4 (c),<j> M (c)} and the 
expected output is y. = {$ +2 (/)>$+i( c )} 5 the 10 last such 

samples were reserved for post-training evaluation of network 
performance (test set), while the remaining 
N = 982-10 = 972 constituted the effective training set. We 
adopt the notation Y = [y 1 ,...y iV ] (2 rows and N columns) 
for the matrix of expected outputs. 

B. Training algorithm 

The first pre-processing procedure was normalizing input 
and output data. This was carried out 
dimension-by-dimension, with the usual process of 
subtracting the mean and dividing by standard-deviation. 

To determine the centroids of the hidden units, we used 

K-means [10]. 
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Once the centroids had been computed, we used a heuristic 
approach to estimate cr [10]: 

(9) 

Where d ^ is the maximum centroid-to-centroid distance 
among all centroid pairs. Finally, the weights W were 
determined with the pseudoinverse method [10], according to 
which: 

W = GY r (10) 

Where G + stands for the pseudoinverse of Nxm +1 
matrix G given by: 

1 G i (uj) ••• G m (u,) 

G= 1 GiK) G m ( u 2 ) (U) 

J G,K) ••• G m (u, v )_ 

C. Architecture alternatives 

For comparison purposes, all m in the set 
{3,5,10,15,20,50} were utilized so as to evidence the 

influence of centroid quantity over network generalizability. 
Further, we searched for the centroid count (restricted to 
inveral from 2 to 50) which would minimize the squared error 
E over the test set, where E is defined by: 

£ =Z|y( u .)-y i r (i2) 

Where i loops through the test set, y (u j ) is the output of 
the network under input u ; , and y. is the expected output for 
the same input. 

Finally, we experimented varying how many delayed inputs 
the network receives: instead of 2 delayed measurements for 
each power plant (as in Fig. 1), we tried 3 or more 
measurements. 

We note that the quantity of hidden layers was not a theme 
for study because it is already known that single-hidden-layer 
RBF networks are universal function approximators [14]. 



III. RESULTS AND ANALYSIS 

First we present performances obtained by varying the 
quantity of centroids in the hidder layer in the set 
{3,5,10,15,20,50}. In what follows, we analyze the effects 

of augmenting the quantity of delayed inputs available to the 
network. 

A. Network with 3 centroids 

In Fig. 2 and Fig. 4, we observe that the network was not 
capable of capturing the higher frequencies in the time series 
of both plants, besides being clearly delayed in contrast to the 
plants’ dynamics. 

The first mentioned issue refers to the amplitudes of 
oscillation in the plants’s dynamics and in the network’s 
dynamics: the former exhibits peaks that the latter cannot 
capture in the training set. Further, network outputs in the test 
set approximate moving averages of the expected outputs 
(Fig. 3 and Fig. 5). 


On the other hand, the second mentioned issue alludes to 
the time delay between network output peaks and power plant 
peaks, visible in the training set. 


~r 



Figure 2. 3-centroid network performance on training set for Furnas plant 



Figure 3. 3-centroid network performance on test set for Furnas plant 



Figure 4. 3-centroid network performance on training set for Camargos 

plant 



Figure 5. 3-centroid network performance on test set for Camargos plant 


B. Network with 5 centroids 

We note in Fig. 6 and Fig. 8 a reduction in time delay 
between expected and predicted peaks. Furthermore, higher 
oscilation frequencies were captured by the network, which 
can be perceived by the concave outline between indexes 6 
and 12 in the test set. 

However, inflexion points in the expected outputs, such as 
as index 6 of Furnas plant (test set) were still not captured by 
the model. 
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Figure 6. 5-centroid network performance on training set for Furnas plant 



Figure 7. 5-centroid network performance on test set for Furnas plant 



Figure 8. 5-centroid network performance on training set for Camargos 
plant 



Figure 9. 5-centroid network performance on test set for Camargos plant 


C. Network with 10 centroids 

We again notice a reduction in temporal delay between 
expected and predicted fluxes. Further, the network could 
capture another aspect of the chaotic dynamics: it emulated 
the inflexion noticeable in index 6 of the testing set for Furnas 
plant, even though with temporal delay. Quantitatively there 
was progress as well: the squared error in the test set was 
reduced in contrast to previous models. 



Figure 11. 10-centroid network performance on test set for Furnas plant 



Figure 12. 10-centroid network performance on training set for Camargos 
plant 



Figure 13. 10-centroid network performance on test set for Camargos 
plant 


D. Network with larger centroid counts 

The differences between using 15, 20 or 50 centroids are no 
longer noticeable, but we list remaining data for completeness 
(Fig. 14 to Fig. 25). 



Figure 14. 15-centroid network performance on training set for Furnas 
plant 



Figure 10. 10-centroid network performance on training set for Furnas Figure 15. 15-centroid network performance on test set for Furnas plant 
plant 
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Figure 16. 15-centroid network performance on training set for Camargos 
plant 
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Figure 17. 15-centroid network performance on test set for Camargos 
plant 



Figure 18. 20-centroid network performance on training set for Furnas 
plant 



Figure 19. 20-centroid network performance on test set for Furnas plant 
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Figure 20. 20-centroid network performance on training set for Camargos 
plant 



Figure 21. 20-centroid network performance on test set for Camargos 
plant 



Figure 22. 50-centroid network performance on training set for Furnas 
plant 



Figure 23. 50-centroid network performance on test set for Furnas plant 



Figure 24. 50-centroid network performance on training set for Camargos 
plant 



Figure 25. 50-centroid network performance on test set for Camargos 
plant 


E. Effects of centroid count 

To evaluate the effect of centroid count m on network 
performance, we employed the metric of squared error (12) 
on the test set for comparison. Using this metric, we searched 
the interval from 2 to 50 in an attempt to find the optimum 
value for the parameter, with results summarized by Fig. 26: 


i 

I 


Figure 26. Squared error on test set for various centroid counts 



Above 10, m has few influence on network generalizability. 

F. Networks with more delayed inputs 

We investigated another dimension of architectural 
variation: the quantity of network inputs. In other words, we 
changed the structure illustrated by Fig. 1 to increase the 
maximum delay of inputs to the network. 
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While varying t d (the maximum delay) and m (centroid 
count) simultaneously, we searched for the network with least 
mean squared error (squared error divided by quantity of 
samples) in the test set. The results are illustrated by Fig. 27. 


i 

I 

i 

i 

r 


Figure 27. Mean squared error on test set as a function of the quantity of 
delayed inputs for each power plant 

We observe that increasing maximum delay from 2 to 4 
increased performance, but the opposite happened with 
further increases of delay. Due to the noisy data, however, we 
conjecture that the apparent evolution brought about by 
varying t d does not generalize, i.e. it was verified for the test et 
but does not equate to a real improvement in network 
generalizability. The results presented here can be considered 
an alternative method to the one described in [15], where 
Multilayer Perceptrons are employed for the problem of flux 
prediction. 


r , \\. 


IV. Conclusion 

By using time series data about monthly flux measurements 
of two hydroelectric power plants in Grande river, in a 82 year 
interval, we employed RBF networks coupled with 
NARMAX methodology to predict future flux data. The 
network performance indicated that the model cannot 
replicate the highest frequencies of oscillation that 
characterize the real data, even though it can approximate the 
data in periods of low oscillation. 

Furthermore, the divergence between network and 
expected outputs seemed non-reducible through refining the 
model’s architecture. Therefore it is plausible that the 
investigated dynamical system requires more than just its past 
immediately-visible behavior for successful forecasting. 
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